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QQ Abstract 

Time-frequency representations such as the spectrogram are commonly used to an- 
Q alyze signals having a time-varying distribution of spectral energy, but the spectro- 

^/^ gram is constrained by an unfortunate tradeoff between resolution in time and fre- 

quency. A method of achieving high-resolution spectral representations has been in- 
, dependently introduced by several parties. The technique has been variously named 

reassignment and remapping, but while the implementations have differed in de- 
^ tails, they are all based on the same theoretical and mathematical foundation. In 

this work, we present a brief history of work on the method wc will call the method of 
time-frequency reassignment, and present a unified mathematical description of the 
technique and its derivation. We will focus on the development of time-frequency 
reassignment in the context of the spectrogram, and conclude with a discussion of 
2 some current applications of the reassigned spectrogram. 
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1 Introduction 



Many signals of interest have a distribution of energy that varies in time and 
frequency. For example, any sound signal having a beginning or an end has an 
energy distribution that varies in time, and most sounds exhibit considerable 
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variation in both time and frequency over their duration. Time-frequency rep- 
resentations are commonly used to analyze or characterize such signals. They 
map the one-dimensional time-domain signal into a two-dimensional function 
of time and frequency. A time- frequency representation describes the variation 
of spectral energy distribution over time, much as a musical score describes 
the variation of musical pitch over time. 

In audio signal analysis, the spectrogram is the most commonly-used time- 
frequency representation, probably because it is well-understood, and immune 
to so-called cross-terms that sometimes make other time-frequency represen- 
tations difficult to interpret. But the windowing operation required in spec- 
trogram computation introduces an unsavory tradeoff between time resolution 
and frequency resolution, so spectrograms provide a time- frequency represen- 
tation that is blurred in time, in frequency, or in both dimensions. 

Time-frequency reassignment is a technique for refocussing time-frequency 
data in a blurred representation like the spectrogram by mapping the data 
to time-frequency coordinates that are nearer to the true region of support 
of the analyzed signal. This method has been presented in other publications. 
In particular, the reader is referred to the excellent tutorial in [1] and to [2] . 
Time-frequency reassignment is gaining popularity, but it is still common to 
see new research conducted using the classical spectrogram that could benefit 
in efficiency or effectiveness from the enhancements afforded by reassignment. 

In this paper, we will consider the application of time-frequency reassignment 
only to spectrogram data, though its effectiveness has also been demonstrated 
in the context of many other representations (see for example the extensive 
discussion of reassigned timc-frcqucncy representations and time-scale repre- 
sentations in [3]). For completeness, and because different formulations are 
popular in different research communities, we begin with a review of time- 
frequency analysis using the spectrogram in Section 2 before introducing the 
theory of time-frequency reassignment in Section 3. In Section 4 we discuss 
the notion of instantaneous frequency and its estimation using frequency re- 
assignment. In Section 5, we introduce the condition called "separability", 
which is crucial to obtain a meaningful time-frequency representation of a 
multicomponent signal. 

Most common applications of the method of time-frequency reassignment are 
discrete (sampled) in time and frequency, so we next turn our attention to 
methods for efficiently estimating or computing reassigned time and frequency 
coordinates in the discrete case. Algorithms for implementing time-frequency 
reassignment have been presented in [1] and [2], but complete derivations of 
the techniques have not been published, to our knowledge, so in Section 6, we 
offer them as a useful starting point for future research in the computation of 
higher-order spectral derivatives. 
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We then consider some applications of the method of reassignment. In Sec- 
tion 7, we discuss improvements in spectrogram readabiUty that can be achieved 
using reassigned data, with specific attention to methods of detecting and re- 
moving noisy or unrehable spectral data. In sound modeling applications, not 
only the spectral energy or magnitude is needed, but also the spectral phase. In 
Section 8, we discuss a high-fidelity additive sound model that is constructed 
from reassigned spectral data, and methods by which the short-time Fourier 
transform phase can be corrected to agree with the with the time-frequency 
estimates obtained by reassignment. Finally, in Section 9, we suggest some 
directions for future research using higher-order spectral derivatives. 



2 The Spectrogram as a Time-Frequency Representation 

One of the best-known time-frequency representations is the spectrogram, de- 
fined as the squared magnitude of the short-time Fourier transform 

S{t,u) = \X{t,uj)\'. (1) 

The short-time Foiuicr transform is defined as a complex function of contin- 
uous time t and radian frequency lu by 

X{t,u) = J x{T)h*{t - T)e-^^^dr (2) 

= J x{r)h{t - T)e-^^^dr (3) 
= M(i,u;)e^''^(*''") (4) 

where h{t) is a finite-length, real-valued window function, (so h{t) = h*{t)), 
M{t, cu) is the magnitude of the short-time Fourier transform, and cu) is 
its phase. 

Often, it is more convenient to compute the time- varying spectrum by shifting 
the input signal, x{t), instead of the window function. This modified transform, 
computed by 

Xt{uj) = J x{r + t)h{-T)e-^'^^dT (5) 
= Mt{u;)e^'t"^'^^ (6) 

is simply the Fourier transform of the shifted and windowed input signal, 
Mt{ijj) is the magnitude of the Fourier transform, and (/)t(a;) is its phase. 

Equations 3 and 5 differ in the range of r, the variable of integration, over 
which the integrand is non-zero. Since h{t) is a finite-duration window func- 
tion, the integrand in Equation 5 is always non-zero over the same range of r. 
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for any t. Thus, the temporal reference of the transform "shdes along" the 
signal, instead of remaining fixed at t = 0, so we can call this transform the 
moving window transform [4], to distinguish it from the short-time Fourier 
transform. 

The fixed range of integration in Equation 5 makes the moving window trans- 
form easy to implement directly in a digital system using a fast Fourier trans- 
form, but the two transforms are, in fact, equivalent, differing only in their 
temporal reference. By a change of variable, t' — t + t, in Equation 5, we can 
show that 

Xt{u;) = J x{T + t)h{-T)e-^'^^dT (7) 
= J x{t')h{t - t')e^^^'-''Ut' (8) 

= e^"* J x{t')h{t - ty-^^'^'dt' (9) 

= e^"*X(t,a;) (10) 
= M(t,a;)e^'['^*+'^(*''^)1 (11) 

so the moving window transform defined by Equation 5 is closely related to 
the short-time Fourier transform. The magnitudes of the two transforms are 
equal, and the phases differ only by a linear frequency term, that is, 

Mt{uj)^ M{t,uj) (12) 
(l)t{u;)^u;t + (f){t,u;) (13) 



For a sinusoid having constant frequency, cuo, the phase of the short-time 
Fourier transform evaluated at that frequency, (l){t,u!o), is constant for all 
time, and equal to the phase of the sinusoid at t = 0, whereas 0t(cJo), the 
phase of the moving window transform evalutated at frequency ujo, rotates at 
exactly the frequency of the sinusoid. 

Though the short-time phase spectrum is known to contain important tem- 
poral information about the signal, this information is difficult to interpret, 
so typically, only the short-time magnitude spectrum is considered in the con- 
struction of a time-frequency representation like the spectrogram. In the con- 
struction of additive sinusoidal sound models, the short-time phase spectrum 
is sometimes used to improve the frequency estimates in the time-frequency 
representation of quasi-harmonic sounds [5], but it is often omitted entirely, 
or used only in unmodified reconstruction, as in the Basic Sinusoidal Model, 
described by McAulay and Quatieri [6]. 

As a time-frequency representation, the spectrogram has relatively poor res- 
olution. Time and frequency resolution are governed by the choice of analysis 
window, h{t), and greater concentration in one domain is accompanied by 
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Fig. 1. Spectrogram of an acoustic bass tone having a sharp pluck and a funda- 
mental frequency of approximately 73.4 Hz. The spectrogram was computed using 
a 112.7 ms Kaiser window with a shaping parameter of 12. The harmonic compo- 
nents are resolved but the sharp attack is smeared by the duration of the analysis 
window. 



greater smearing in the other. This smearing can be seen in Figures 1 and 2, 
which show spectrograms of a single pluck of an acoustic bass. The decaying 
part of the tone (after the pluck) is well-represented by nearly-harmonic sinu- 
soidal components having a fundamental frequency of approximately 73.4 Hz. 
A very short-duration analysis window is needed in order to represent the tem- 
poral structure of the abrupt attack, but any window function short enough 
to provide the necessary temporal precision is much too wide in frequency to 
resolve the harmonic components of the tone (spaced at about 73.4 Hz). In 
Figure 1, the spectrogram has been computed using a 112.7 ms Kaiser win- 
dow with a shaping parameter of 12. The harmonic components are resolved 
but the sharp attack is smeared by the duration of the analysis window. In 
Figure 2, the spectrogram has been computed using a much shorter Kaiser 
window (10 ms). The attack is less distorted but the harmonic structure is 
no longer discernible. The combination of poor resolution and poor precision 
often makes it necessary to use two or more spectrograms, like the two in 
Figures 1 and 2, to analyze an audio signal. 

A time-frequency representation having improved resolution, relative to the 
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Fig. 2. Spectrogram of an acoustic bass tone having a sharp pluck and a fundamental 
frequency of approximately 73.4 Hz. The spectrogram was computed using a 10 ms 
Kaiser window with a shaping parameter of 12. The attack is less distorted than in 
Figure 1 (though some smearing is still evident) but the harmonic structure is no 
longer discernible. 

spectrogram, is the Wigner-Ville distribution 

W^{t,uj)= J x{t + T/2)x*{t-T/2)e-^'^^dT (14) 

which may be interpreted as a short-time Fourier transform with a window 
function that is perfectly matched to the signal. The Wigner-Ville distribution 
is highly-concentrated in time and frequency, but it is also highly nonlinear 
and non-local. Consequently, this distribution is very sensitive to noise, and 
generates cross-components that often mask the components of interest, mak- 
ing it difficult to extract useful information concerning the distribution of 
energy in multi-component signals. 

Cohen's class of bilinear time-frequency representations [7] is a class of 
"smoothed" Wigner-Ville distributions, defined 



C^{t,uj) = JJ W^{t,u)<^{t -t,u -uj)dTdu (15) 

where $(t,cj) is a smoothing kernel that can reduce sensitivity to noise and 
suppresses cross-components, at the expense of smearing the distribution in 
time and frequency. This smearing causes the distribution to be non-zero in 
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regions where the true Wigner-Ville distribution shows no energy. Cx{t, uj) can 
be considered an average over a domain centered at the point t, u of the vahies 
of the Wigner-Ville distribtuion at neighboring points t — t,u! — fl weighted 
by the value of the smoothing kernel $(r, O). 

The spectrogram in Equation 1 is a member of Cohen's class. It is a smoothed 
Wigner-Ville distribution with the smoothing kernel equal to the Wigner-Ville 
distribution of the window function h{t), that is, 



The method of reassignment smoothes the Wigner-Ville distribution, but then 
refocuses the distribution back to the true regions of support of the signal com- 
ponents. The method has been shown to reduce time and frequency smearing 
of any member of Cohen's class [3], but we will focus here on its application 
to the spectrogram, and by extension, to short-time Fourier analysis of time- 
varying audio signals. In the case of the reassigned spectrogram, the short-time 
phase spectrum, (f)(t, u), is used to correct the nominal time and frequency co- 
ordinates of the spectral data, and map it back nearer to the true regions of 
support of the analyzed signal. 



3 The Method of Reassignment 

Pioneering work on the method of reassignment was first published by Kodera, 
Gendrin, and de Villedary under the name of Modified Moving Window 
Method [4]. Their technique enhances the resolution in time and frequency 
of the classical Moving Window Method, the time-frequency representation 
constructed from the squared magnitude of the moving window transform de- 
fined in Equation 5, by assigning to each data point a new time-frequency 
coordinate that better-refiects the distribution of energy in the analyzed sig- 
nal. 

In the classical moving window method, a time-domain signal, x{t) is decom- 
posed into a set of coefficients, e{t,u!), based on a a set of elementary signals, 
hu,{t), defined 



where h{t) is a (real-valued) lowpass kernel function, like the window function 



<^{t,uj) = Wh{t,uj). 



(16) 



h^{t) = /i(i)e^'^* 



(17) 
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in Equation 3. The coefficients in this decomposition are defined 

e{t,uj) = J x{T)h{t - T)e-J'^l"-*l(iT (18) 

= e^'"* J x{T)h{t - T)e-^'^^dT (19) 

^e^''^X{t,cu) (20) 
= Xticu) (21) 

In other words, the coefficients in the moving window method are computed 
from the moving window transform defined in Equation 5. In the moving 
window method, the time-frequency representation is constructed from the 
squared magnitude of the coefficients, and since the magnitude of these co- 
efficients is identical to the magnitude of the short-time Fourier transform 
coefficients (see Equation 12), this time-frequency representation is exactly 
equivalent to the spectrogram. 

x{t) can be reconstructed from the moving window coefficients by 

x{t) = jj Xr{cu)hl{T - t)dujdT (22) 

^ jj X^{uj)h{T - t)e-^''^''-^^dujdT (23) 

Af,(cj)e-'''^-(")/i(r - t)e-^^^^-'^dujdT (24) 

M^{uj)h{T - i)e^['^-(^)-'"^+'"*lciu;(iT (25) 
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For signals having magnitude spectra, M{t,u!), whose time variation is slow 
relative to the phase variation, the maximum contribution to the reconstruc- 
tion integral comes from the vicinity of the point t,u! satisfying the phase 
stationarity condition 

d 

— [(j)r{uj)-UJT + Ujt]=0 (26) 

d 

— [(l)r{u;)-u;r + u;t]^0 (27) 

or equivalently, around the point i, Cj defined by 

t(r,^) = r-^ = -^%^ (28) 
au) ouj 

u;{r,u;)^^^^u;+ . (29) 



This phenomenon has long been known in such fields as optics as the principle 
of stationary phase (see for example [8]). The principle of stationary phase 
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states that for periodic or quasi-periodic signals, signals that arc concentrated 
in frequency, the variation of the Fourier phase spectrum not attributable to 
periodic oscillation is slow with respect to time in the vicinity of the frequency 
of oscillation, and in surrounding regions the variation is relatively rapid. 
Analogously, for impulsive signals, that are concentrated in time, the variation 
of the phase spectrum is slow with respect to frequency near the time of the 
impulse, and in surrounding regions the variation is relatively rapid. 

In sinusoidal reconstruction, positive and negative contributions to the synthe- 
sized waveform cancel, due to destructive interference, in frequency regions of 
rapid phase variation. Only regions of slow phase variation (stationary phase) 
will contribute significantly to the reconstruction, and the maximum contribu- 
tion (center of gravity) occurs at the point where the phase is changing most 
slowly with respect to time and frequency. 

The time-frequency coordinates computed by Equation 28 and Equation 29 
are the local group delay, tg{t,u!), and local instantaneous frequency, LUi{t,u!), 
and are computed from the phase of the short-time Fourier transform, which 
is normally ignored when constructing the spectrogram, though it is known to 
contain significant information about the signal. These quantities are "local" 
in the sense that they are represent a windowed and filtered signal that is 
localized in time and frequency, and are not global properties of the signal 
under analysis. 

The modified moving window method changes (reassigns) the point of attri- 
bution of e(t,uj) to this point of maximum contribution t{t,uj),uj{t,uj), rather 
than to the point t, ui at which it is computed. This point is sometimes called 
the "center of gravity" of the distribution, by way of analogy to a mass distri- 
bution (in fact, Kodera et al. demonstrated that the coordinates t{t, a;), a)(t, u) 
represent the center of gravity of Rihaczek's complex energy distribution [9] 
for a real signal filtered by the short-time Fourier transform). This analogy is a 
useful reminder that the attribution of spectral energy to the center of gravity 
of its distribution only makes sense when there is energy to attribute, so the 
method of reassignment has no meaning at points where the spectrogram is 
zero- valued. 



4 Local Estimation of Instantaneous Frequency 

The group delay, defined in Equation 28, is often interpreted as the time delay, 

or average time, associated with a particular frequency, and its adoption as 
the reassigned temporal coordinate is consistent with that interpretation. In 
fact, it can easily be shown that the group delay (or equivalently, the time 
reassignment operation) exactly predicts the time of an impulse that lies in 
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the region of support of the analysis window h{t). 

The notion of instantaneous frequency also has a long history in the signal 
processing literature, but it is normally computed from the signalphase, rather 
than the spectral phase. Specifically, when a signal is expressed in analytic 
form, 

xit) = A(t)e^'^W (30) 

where A{t) is the (real) amplitude envelope and 9{t) is the (real) phase func- 
tion, then the instantaneous frequency is defined as the derivative with respect 
to frequency of the signal phase, 9{t), that is 

^ f (31) 



While it is clearly possible to obtain a single-component analytic representa- 
tion of any signal (or, indeed, an infinite number of such representations), such 
a representation is not intuitively satisfying for most audio signals. Pitched 
sounds, such as musical instrument tones, are characterized by quasi-harmonic 
spectra, and a representation as a sum of components representing the vari- 
ous harmonic partials is more revealing and intuitive. Vocal sounds arc often 
analyzed as the response of a resonant system (the vocal tract) to some excita- 
tion signal, and a multicomponent representation that identifies the formants 
of the resonant system is more informative. 

A crucial insight in the development of the method of frequency reassignment 
follows from the interpretation of the short-time Fourier transform as a de- 
modulated bank of linear time-invariant bandpass filters, wherein each filter, 
hu){t), has an impulse response determined by 

h^{t) = h{t)e^''' (32) 

Since the window function, h{t) is the impulse response of a finite impulse 
response lowpass filter, the modulated window function is the impulse re- 
sponse of a finite impulse response bandpass filter with passband centered at 
frequency cu. Since h{t) is real, hij{t) is complex and, therefore, describes a fil- 
ter having an assymmetric frequency response. An analogous bandpass filter 
having a real impulse response would pass frequencies around ±a;, but hi^{t) 
passes only frequencies around cu. 

In the filterbank interpretation of the short-time Fourier transform, X{t,Lv) 
describes the output of a bank of such filters excited by the input signal, x{t). 
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That is, 



X{t,Lu) = J x{T)h{t - T)e-^'^^dT (33) 

= j x{T)h{t - T)e^'^[*-^]e-^'*^*dT (34) 

= e-^''"* J x{T)K{t - T)dT (35) 
= e-^^'[x{t)*K{t)] (36) 

In this interpretation, X(t,C(Jo) describes the demodulated output of a single 
bandpass filter, centered at frequency o^o- The output of the filter is considered 
to be a single complex exponential having magnitude M(i, a;o) and a phase 
composed of a linear component, due to sinusoidal oscillation at frequency ojq, 
and another time- varying component, (f){t^ujQ), accounting for the deviation 
from pure sinusoidal oscillation at frequency loq. X(t,ujQ) is the output of the 
filter demodulated to remove the sinusoidal oscillation at frequency uq, and 
the output of the moving window transform Xt{u;o) — X{t, 6<;o)e^'*'°*, is the raw 
output of the bandpass filter centered at frequency cuq. 



Kodera et al. [4] showed that the local instantaneous frequency, computed from 
the derivative with respect to time of the spectral phase, (j){t,Lj), is equal to 
the instantaneous frequency of the bandpass filtered signal that is the output 
of the short-time Fourier transform at the coordinates t,u;. For a signal 

x{t) = A{t)e^^^'^ (37) 

having instantaneous frequency 

<t) = (38) 

X{t,u;o)e^'^°* is the output of the bandpass filter /ia;o(t), centered at frequency 
ouq, when the input is x(t). If, at time t, the instantaneous frequency of the 
input, cjj(t) is far from ujq, such that the instantaneous frequency of the input 
is outside the passband of the filter centered at ujo, then the output of the 
filter is essentially zero. If, on the other hand, (jj{t), is within the passband of 
the filter (near 6i;o), then the signal passes through the filter unaltered except 
for a scale factor, equal to the passband gain, and a constant time delay, so 
its instantaneous frequency can be computed from the phase of the response 
of hi^oit) or any filter that passes x{t). 



The filters that comprise the short-time Fourier transform introduce only a 

constant offset to the phase of any component that they pass, and the spectral 
phase obtained from the response of a single short-time Fourier transform filter 
differs only by a constant offset from the phase of a single component whose 
frequency hes in the passband of that filter. Therefore, the derivative with 
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respect to time of the filtered signal, Xt{uj), is equal to the derivative with 
respect to time of the original signal, x{t), and so the instantaneous frequency 
of that component can be computed from the phase of the short-time Fourier 
transform evaluated at uj. That is, 

^i(t) = ^^^Tg{x(t)} (39) 

= ^arg{X,(a;)} (40) 

= ^arg{e^-*X(t,a;)} (41) 

= + ^ (42) 
= cD(i,a;) (43) 

for any uj such that hi_j{t) is the impulse response of a filter that passes x{t). 

The short-time Fourier transform filters may also introduce a time delay (if, 
for example, the frequency of the signal under analysis is not exactly equal to 
the center frequency of the filter), but this delay is precisely the group delay. 
Therefore, the local instantaneous frequency computed from the derivative 
with respect to time of the spectral phase is equal to the instantaneous fre- 
quency of the signal at a time offset from the center of the analysis window 
by the group delay, computed from the derivative with respect to frequency 
of the spectral phase. The short-time Fourier transform coefficients evaluated 
at time t and frequency u are mapped from the geometric center of the anal- 
ysis window {tjCu) onto the region of support of the analyzed signal by the 
reassignment operations in Equation 28 and Equation 29. 

The term local instantaneous frequency indicates that u{t,u!) is the instan- 
taneous frequency of the dominant component at a particular time and fre- 
quency. Nelson calls it the channelized instantaneous frequency [10,11], to em- 
phasize that it is the instantaneous frequency of a component passing through 
a single short-time Fourier transform channel. Flanagan also showed that in- 
stantaneous frequency could be computed for each short-time Fourier trans- 
form bin from partial derivatives of the phase spectrum [12], and his method 
has been widely used for obtaining estimates of fundamental frequency (see 
for example [13] and [14]). 

Figure 3 demonstrates the effect of frequency reassignment for a fragment 
of voiced speech. The upper plot shows the conventional (dashed lines) and 
reassigned (crosses) magnitude spectra. The lower plot shows the mapping 
of nominal (Fourier transform bin) frequency to reassigned frequency for the 
same fragment of speech. Near the frequencies of strong harmonics, the map- 
ping is flat, as all nearby transform data is reassigned to the frequency of 
the dominant harmonic component. This consensus, or clustering among re- 
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DFT bin ffsquency <Hi) 

Fig. 3. Demonstration of frequency reassignment in a single spectrum for a fragment 
of speech (the "o" in "open" ) , computed using a 33.6 ms Kaiser analysis window with 
a shaping parameter of 12. The upper plot shows the conventional (dashed lines) 
and reassigned (crosses) magnitude spectra. The lower plot shows the mapping of 
nominal (Fourier transform bin) frequency to reassigned frequency for the same 
fragment of speech. The flat portions of the lower curve represent regions in which 
the energy in many transform bins is reassigned to the same frequency. The circled 
points show the samples having locally minimal frequency reassignments. 

assigned frequency estimates in the vicinity of spectral peaks can be used as 
an indicator of the reliability of the time-frequency data [15]. If the reassigned 
frequencies for neighboring short-time Fourier transform channels are all very 
similar, then there is said to be a high degree of consensus and the quality of 
the frequency estimates is assumed to be good. 

The method of time-frequency reassignment has been used in a variety of 
applications for obtaining improved time and frequency estimates for time- 
varying spectral data [16,17,18]. Often, only the frequency reassignment oper- 
ation is used to compute instantaneous frequency estimates [13,15]. It should 
be noted that, for many choices of analysis window, the much simpler method 
of parabolic interpolation of the magnitude spectrum, proposed by Smith and 
Scrra [19] gives very similar frequency estimates (and in some cases, even more 
precise estimates, according to [20]). We are, however, aware of no compet- 
ing method for improving the accuracy of the time estimates in short-time 
spectral analysis. 
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5 Separability 



The short-time Fourier transform can often be used to estimate the amphtudes 
and phases of the individual components in a multi- component signal, such as 
a quasi-harmonic musical instrument tone (see, for example, [5]). Moreover, 
the time and frequency reassignment operations described by Equation 28 
and Equation 29 can be used to sharpen the representation by attributing the 
spectral energy reported by the short-time Fourier transform to the point that 
is the local center of gravity of the complex energy distribution [18]. 

For a signal consisting of a single component, described by Equation 37, the 
instantaneous frequency can be estimated from the partial derivatives of phase 
of any short-time Fourier transform channel that passes the component, as 
shown above. If the signal is to be decomposed into many components, 

x{t)^J2Mt)e''-^'^ (44) 

n 

and the instantaneous frequency of each component is defined as the derivative 
of its phase with respect to time, that is, 

Mt) = (45) 

then the instantaneous frequency of each individual component can be com- 
puted from the phase of the response of a filter that passes that component, 
provided that no more than one component lies in the passband of the filter. 

This is the property, in the frequency domain, that Nelson called "separa- 
bilty" [10,11] and we require this property of all the signals we analyze. If 

this property is not met, then we cannot achieve the desired multicomponent 
decomposition, because we can not estimate the parameters of individual com- 
ponents from the short-time Fourier transform and we must choose a different 
analysis window so that the separability criterion is satisfied. 

If the components of a signal are separable in frequency with respect to a 
particular short-time spectral analysis window, then the output of each short- 
time Fourier transform filter is a filtered version of, at most, a single dominant 
(having significant energy) component, and so the derivative, with respect to 
time, of the phase of the X{t, ujo) is equal to the derivative with respect to time, 
of the phase of the dominant component at ujq. Therefore, if a component, 
Xn{t), having instantaneous frequency a;„(t) is the dominant component in 
the vicinity of ujq, then the instantaneous frequency of that component can be 
computed from the phase of the short-time Fourier transformevaluated at cjo. 
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That is, 



Un{t) 



d_ 

di 
d_ 

di 



arg{x„(t)} 



axg{X{t,ujQ)} 



(46) 



(47) 



Thus, the partial derivative with respect to time of the phase of the short- 
time Fourier transform can be used to compute the instantaneous frequencies 
of the individual components in a signal described by Equation 44, provided 
only that the components are separable in frequency by the chosen analysis 
window. 

Just as we require that each bandpass filter in the short-time Fourier transform 
filterbank pass at most a single complex exponential component, we require 
that two temporal events be sufficiently separated in time that they do not 
lie in the same windowed segment of the input signal. This is the property of 
separability in the time domain, and is equivalent to requiring that the time 
between two events be greater than the length of the impulse response of the 
short-time Fourier transform filters, the span of non-zero samples in h{t). 

Separability in time and in frequency is required of components we wish to 
resolve in a reassigned time-frequency representation. If the components in a 
decomposition are separable in time and frequency in a certain time-frequency 

representation, then the components can be resolved by that time-frequency 
representation, and using the method of reassignment, can be characterized 
with much greater precision than is possible using classical methods. 

For any signal, there are an infinite number of decompositions of the form 
given in Equation 44. The separability property must be considered in the 
context of the desired decomposition. Figure 4 shows a reassigned spectrogram 
of a speech signal computed using an analysis window that is long relative to 
the time between glottal pulses. The harmonics are clearly visible, and the 
formant frequencies can also be discerned, but the individual glottal pulses 
are smeared in this representation, because many pulses are covered by each 
analysis window (that is, the individual pulses are not separable, in time, 
by the chosen analysis window). Figure 5 shows a spectrogram of the same 
speech signal computed using an analysis window that is much shorter than 
the time between glottal pulses. In this representation, the individual pulses 
are clearly visible, because no window spans more than one pulse, but the 
harmonic frequencies are not visible, because the main lobe of the analysis 
window spectrum is much wider than the spacing between the harmonics 
(that is, the harmonics are not separable, in frequency, by the chosen analysis 
window) . 
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Fig. 4. Long-window reassigned spectrogram of the word "open", computed using 
a 54.4 ms Kaiser window with a shaping parameter of 9, emphasizing harmonics. 
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Fig. 5. Short-window reassigned spectrogram of the word "open", computed using 
a 13.6 ms Kaiser window with a shaping parameter of 9, emphasizing formants and 
glottal pulses. 



16 



6 Efficient Computation of Reassigned Times and Frequencies 

In digital signal processing, it is most common to sample the time and fre- 
quency domains. The discrete Fourier transform is used to compute samples 
X{k) of the Fourier transform from samples x{n) of a time domain signal, 

N-l ^ 

X{k) = ^ x{m)e-^^ (48) 

m=0 

where x{n) is the time domain signal sampled at t„ = nT for sampling pe- 
riod T, X{k) is the discrete Fourier transform coefficients, equal (for adequately- 
sampled bandlimited signals) to samples of the Fourier transform at (radian) 
frequencies uJk = 2iTk/N. For highly-composite N, such as powers of two, the 
discrete Fourier transform can be computed very efficiently using a fast Fourier 
transform algorithm. 

The discrete short-time Fourier transform can be computed 

n 

X{n,k) = x{m)h{n — m)e (49) 

m=n-N+l 
N-l 

— e + n)n{—m)e ^ (oUj 

m=0 

j27vkn , , , . 

= e—X^{k) (51) 

where h{n) is samples of a real- valued, finite-length window function that is 
non-zero only on the range n = . . . — 1, and is analogous to the analy- 
sis window h{t) in Equation 3. Xn{k) is the discrete Fourier transform of a 
shifted and windowed input signal, the discrete time moving window trans- 
form defined in Equation 5. For a signal sampled with sampling period T 
seconds, the A^-point discrete short-time Fourier transform computes samples 
of the short-time Fourier transform at times t„ = nT and (radian) frequencies 
ook = 2'Kk/N. 

The reassignment operations proposed by Kodcra et al. cannot be applied 
to the discrete short-time Fourier transform data, because partial derivatives 
cannot be computed directly on data that is discrete in time and frequency. It 
has been suggested that this difficulty has been the primary barrier to wider 
use of the method of reassignment [21]. 

6. 1 Approximation of the Partial Derivatives of Phase 

It is possible to approximate the partial derivatives using finite differences. 
For example, the phase spectrum can be evaluated at two nearby times, and 
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the partial derivative with respect to time be approximated as the difference 
between the two values divided by the time difference, as in: 



d(f){t,U!) 

dt 
d(t){t,uj) 



1 

Xt 
1 



0(t,a;+— ) 



(j){t, UJ 



(52) 
(53) 



For sufficiently small values of At and Acu, this finite-difference method yields 
good approximations to the partial derivatives of phase, because in regions of 
the spectrum in which the evolution of the phase is dominated by rotation due 
to sinusoidal oscillation of a single, nearby component, the phase is a linear 
function. 

But the phase of the Fourier transform is the argument of a complex quan- 
tity, and can only be computed modulo 27r. This phase wrapping effect, that 
equates all phases differing by a multiple of 27r, has no practical impact on 
the individual phase values, but is of some consequence when two phases 
are combined as in the finite difference derivative approximation, because the 
difference between two phase values may not be preserved by the wrapping 
process. 

Fortunately, At and Aa;, can be chosen to be small enough that, for a properly- 
sampled signal, the phase difference can be easily "unwrapped" . The absolute 
value of the difference in phase between consecutive samples, in time or in 
frequency, of the short-time Fourier transform cannot exceed vr in any region 
of the spectrum dominated by a significant oscillating component. Any phase 
difference that exceeds n, in the absolute sense, has been corrupted by phase 
wrapping, and can be corrected by simply adding or subtracting 27r to obtain 
an absolute vahie that is less than tt. Thus, if At and Aa; are chosen to be one 
sample, then the finite difference method can be used to accurately-estimate 
the reassigned times and frequencies in regions of the spectrum in which the 
evolution of the phase is dominated by rotation due to sinusoidal oscillation 
of a nearby component. 

Charpentier [22] used a finite difference approximation to Flanagan's instan- 
taneous frequency equation [12] and showed that the approximation could 
be computed using only a single Fourier transform for the case of the Hann 
analysis window. 

Unaware of the work of Kodera et ai, Nelson arrived at a similar method for 
computing reassigned time-frequency coordinates for short-time spectral data 
from partial derivatives of the short-time phase spectrum [10,11]. Instead of 
directly computing the first differences of phase. Nelson first computes two 
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so-called cross spectral surfaces, 

C{t,u;) = X{t+^,u;)X*{t - ^,00) (54) 
L{t,u;) = X{t,u;+ ^)X*it,u; - ^) (55) 

The partial derivatives are then approximated by the phase of these cross 
spectra. 

d(l){t,uj) 1 

At ^'■S(*^(^' '^)) (^^) 



It is easily shown that approximation of the derivatives by means of a cross 
spectral surface is equivalent to computing the finite differences directly, only 
the differences are unwrapped automatically when the argument is computed, 
for example: 



At 



arg(C(t,^)) = ^arg(X(t + ^,a;)X*(t-^,^) 



At 
1 

At 



(58) 



arg M(t + — ,a;)e- 



2 



- arg (^M(t + — , a;)M(t - — , a;)e- 



(59) 



jU(t+^,a;)-0(t-f ,a;) 



At 
1 

At 



At , At 



(60) 
(61) 



While these linear differences only approximate the partial derivatives of the 
phase with respect to time and frequency, they give very good approximations 
in regions of the spectrum dominated by a single, significant concentration of 
energy, such as an impulse or a sinusoid, because in these regions the evolution 
of the phase spectrum is linear in time and in frequency. In regions of the 
spectrum having no significant concentration of energy, the finite difference 
approximation may not be a good one, but it makes little sense to compute 
reassigned time and frequency coordinates when there is no energy to reassign. 



6.2 Evaluation of the Partial Derivatives of Phase Using Transforms 



Auger and Flandrin [3] showed how the method of reassignment, proposed in 
the context of the spectrogram by Kodera et al., could be extended to any 
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member of Cohen's class of time- frequency representations by generalizing the 
reassignment operations in Equation 28 and Equation 29 to 



t{t,Uj) = t 



II T ■ Wx{t -T,uj -u) ■ $(r, h')dTdv 
IIWx{t -T,uj -v) ■ $(t, v)dTdi' 
II ^ • Wx{t -T,uj -v) ■ $(t, u)dTdu 
IIWx{t -T,uj -v) ■ $(r, u)dTdu 



(62) 



(63) 



where Wx{t,u!) is the Wigner-Ville distribution of x{t), and $(t, a;) is the 
kernel function that defines the distribution. They further described an ef- 
ficient method for computing the times and frequencies for the reassigned 
spectrogram efficiently and accurately without explicitly computing the par- 
tial derivatives of phase. 

In the case of the spectrogram, Sx{t,u;) — \X{t,u;)\'^, the reassignment opera- 
tions in Equation 28 and Equation 29 can be computed by 



where X{t,u) is the short-time Fourier transform computed using an analysis 
window h{t), Xrhii-,'^) is the short-time Fourier transform computed using a 
time- weighted anlaysis window hr{t) — t ■ h{t) and Xx>h{t, ou) is the short-time 
Fourier transform computed using a time-derivative analysis window hx>{t) — 



In this method, using the auxiliary window functions hr{t) and hvit), the 
reassignment operations can be computed at any time-frequency coordinate 
t,uj from an algebraic combination of the values of three Fourier transforms 
evaluated at t,u;, without directly evaluating or approximating the partial 
derivatives of phase. A method of computing instantaneous frequency equiv- 
alent to Equation 65 was independently discovered by Abe [23], and is some- 
times used in fundamental frequency estimation (see for example [13]). Since 
the these algorithms operate only on spectral data evaluated at a single time 
and frequency, and do not explicitly compute any derivatives, they can easily 
be implemented in digital systems using discrete times and frequencies. 

The time-weighted window function, hrif), is trivially computed by pointwise 
multiplication of the original window function, /i(t), by a time ramp. If the 
derivative of the window function is unknown, then /^©(t) can also be computed 
numerically. The derivative theorem for Fourier transforms, which states that 
if X{uj) is the Fourier transform of x{t), then the Fourier transform of J^a;(t) 




(64) 



(65) 
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is ju!X{u!). That is, 

x{t) ^ X{u;) (66) 

implies 

j^x{t) ^ jujX{uj) (67) 

We can therefore construct the time-derivative window used in the evaluation 
of the frequency reassignment operator by computing the Fourier transform 
of h{t), multiplying by juj, and inverting the Fourier transform. That is, 

jhit) = ¥¥T-'{ju^H{u^)} (68) 

= -$5|FFT-i{u;if(u;)}| (69) 

and so, in discrete time, 

h,,{n)^-^[¥¥T-'{^H{k)}] (70) 



The auxiliary short-time analysis windows employed in the computation of 
Auger and Flandrin's reassignment operations are shown in Figure 6 for the 
case of h{n) being 501 samples at 44.1 kHz of a Kaiser window with shaping 
parameter equal to 12. 

Using Equation 64 and Equation 65, the value of the reassignment operations 
at a; can be computed from the values of the three short-time Fourier trans- 
forms at t, uj, without direct evaluation of any partial derivatives. So if the 
values of the transforms at discrete time and frequency coordinates, 
are known, then the values of the reassignment operations can be computed 
for those discrete times and frequencies without resorting to discrete approx- 
imations to the partial derivatives in Equation 28 and Equation 29. Since the 
discrete short-time Fourier transform computes the values of the short-time 
Fourier transform at discrete times and frequencies, values of the reassign- 
ment operations, ujitmUJk) and t{tn,u!k) can be computed from three discrete 
short-time Fourier transforms. This gives an efficient method of computing 
the reassigned discrete short-time Fourier transform provided only that the 
|X(i, a;)p is non-zero. This is not much of a restriction, since the reassign- 
ment operation itself implies that there is some energy to reassign, and has 
no meaning when the distribution is zero- valued. 

In the two sections that follow, we derive the reassignment operations proposed 
by Auger and Flandrin. The implementation of the reassignment operations 
has been described elsewhere [1,2], and partial derivations have been presented 
(see, for example [3] and [21]), but we are not aware of a complete, published 
derivation. We include the derivations here for completeness, and because 
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Fig. 6. Representative analysis windows employed in the three short-time transforms 
used to compute reassigned times and frequencies. Waveform (a) is the original 
window function, h{n) (a Kaiser window with shaping parameter 12.0, in this case), 
waveform (b) is the time-weighted window function, hq-{n), and waveform (c) is the 
frequency- weighted window function, hT>{n). In each case, the plots were made from 
501 samples of the corresponding window function at 44.1 kHz. 

we have found them to be a useful starting point for deriving methods of 
computing other spectral derivatives. 

The procedure in each derivation is the same. First, we show that the partial 
derivative of the spectral phase can be expressed in terms of the short-time 
Fourier transform and its partial derivative. Then, we show that the par- 
tial derivative of the short-time Fourier transform can be computed from the 
transform itself and a second transform computed using a different window 
function. Finally, we combine the results to obtain an expression for the reas- 
signed coordinate that does not include any explicit partial derivatives. 

Readers not interested in the mathematical derivation of the reassignment 
operations can skip ahead to Section 7. 
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6.2.1 Derivation of Efficient Spectrogram Time Reassignment Operator 

In this section, we present the mathematical derivation of the efficient time 
reassignment operator discovered by Auger and Flandrin. In Section 6.2.2 we 
will present the derivation of the frequency reassignment operator. We begin 
by restating the time reassignment operation identified by Kodera et. al, and 
presented here in Equation 28 

iit,.) = -^^. 



To arrive at an expression for the partial derivative of spectral phase with 
respect to frequency, we first take the partial derivative of X{t, uj) with respect 
to frequency. Applying the product rule of differential calculus to Equation 4, 



d_ 



X{t,uj) 



d_ 



dM{t,u) 
duj 



+ M{t,u;)-j 



dui 



.d(j){t,uj) 
+ j^^X{t,u) 



(71) 
(72) 
(73) 



In order to isolate the partial derivative of phase, we can multiply by 
jX*(t,u)/\X(t,uj)\'^ (this operation is only valid when |X(i,a;)p is non-zero, 
but as noted earlier, the reassignment operation itself has no meaning when 
the distribution is zero-valued) and simplify to obtain 



'd_ 



X{t,uj) 



jX*{t,u;) 
|X(t,a;)P 



ouj auj 



jX*{t,ij) 



X(t,u;)|2 
(74) 

dMit,u;) jX*it,u) dct>it,u) |X(^,a;)|^ 

du \X(t,u;)\^ duj \X{t,uj)\'^ 

(75) 

dM{t,u) jM{t,uj) d(j){t,uj) 



dco |^(t,a;)|^ 



du 



d4>{t,ijj) 



+ 3 



M{t,uj) 
\X{t,u;)[^ 



dM{t,uj) 
du 



(76) 
(77) 



Since M(t, uo) is real- valued, the real part of this expression is precisely the neg- 
ative partial derivative with respect to frequency of the phase of the short-time 
Fourier transform. Thus, we conclude that the partial derivative of spectral 
phase with respect to frequency, and hence, the reassigned time, t, can be 
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computed by 



i{t, u) 



d4>{t,uj) 



(78) 



Equation 78 expresses the partial derivative of the short-time Fourier trans- 
form phase with respect to frequency in terms of the transform itself and its 
partial derivative with respect to frequency. 

Next, we will show that the partial derivative of the short-time Fourier trans- 
form with respect to frequency can be computed without explicitly computing 
or approximating any derivatives. Taking the partial derivative of the short- 
time Fourier transform given by Equation 3, 



d_ 



X{t,u) = 



d_ 
do; 



x{T)h{t - T)e-^^^dT 



x{T)h{t - t) 
J x{r)h{t-T) 



d_ 
du 



-JU)T 



-J re 



-JUIT 



(79) 

dr (80) 

dr (81) 

-j j x{T)-T-h{t- T)e-^'^^dT (82) 

-jtX{t, uj) + jtX{t, x{t) ■T-h{t- T)e-^'^^dT (83) 

-jtX(t, oj)+j J x(t) -(t-r)- h(t - T)e-^'^^dT (84) 

~jtX{t, (^)+J J x{T)hr{t - T)e-'J'^^dT (85) 

-jtX{t,uj) + jXrh{t,uj) (86) 



where hrit) = t ■ h{t) is the time- weighted anlaysis window described in 
Section 6 and shown in Figure 6, and Xrui^-,^) short-time Fourier 

transform computed using this time-weighted analysis window. Multiplying, 
as before (and with the same caveat concerning zero- valued distributions), by 
jX*{t,uj)/\X{t,uj)\^, we obtain 



d_ 



X{t,uj) 



[-jtX{t,u) + jXruM] 



\X{t,u)\^ 
\X{t,uj)\^ jXrH{t,u)-jX*{t,uj) 
|X(t,a;)P^ W,a;)P 
XrhM-X*{t,u) 



(87) 
(88) 
(89) 
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Substituting this expression into Equation 78, we obtain 

'^''^^^ — duT^'-^X — \x{t;^- — 

which is the time reassignment operation proposed by Auger and Flandrin, 
previously presented in Equation 28. 




6.2.2 Derivation of Efficient Spectrogram Frequency Reassignment Operator 

In this section, we present the mathematical derivation of the efficient fre- 
quency reassignment operator discovered by Auger and Flandrin. We begin 
by restating the frequency reassignment operation identified by Kodera et. al., 
and presented here in Equation 29 

u{t,u;) = a;H — — . 



To arrive at an expression for the partial derivative of spectral phase with 
respect to time, we first take the partial derivative of X{t,uj) with respect to 
time. Applying the product rule of differential calculus to Equation 4, 



d_ 

at 



X{t,uj) 



d r 



dt 

dM{t,uj) 

at 

dM{t,uj) 

m 



(91) 
(92) 
(93) 



In order to isolate the partial derivative of phase, we can multiply by 
X*{t,uj)/\X{t,uj)\'^ (this operation is only valid when |X(t,cj)p is non-zero, 
but as noted earlier, the reassignment operation itself has no meaning when 
the distribution is zero-valued) and simplifying to obtain 



dX{t,uj) X*{t,Lj) 



dt \X{t,u;)[^ 



dM{t,u;) d<P{t,u) ■ 

dt dt ^'^ 



X*{t,u) 

\X{t,uW 
(94) 

dM{t,u;) ,^(,,,) X*{t,u) mt,u^) \X{t,u;)\' 
dt \X{t,uj)\^ dt \X{t,u)\^ 

(95) 

dM{t,u;) M{t,u;) MM .... 

+ J — — (96) 



dt 



\X{t,uj) 



dt 
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Since M(t, u) is real- valued, the imaginary part of this expression is precisely 
the partial derivative with respect to time of the phase of the short-time 
Fourier transform. Thus, we conclude that the value of the frequency reas- 
signment operator can be computed by 

, d<P{t,u;) , J dX{t,u;) X*(t^ 
u){t,uj) = u) -\ ^7 — = a;-Fb-< — — — r-7- > (97) 



dt + 9f ' \X{t,u;) 



Equation 97 expresses the partial derivative of the short-time Fourier trans- 
form phase with respect to time in terms of the transform itself and its partial 
derivative with respect to time. 

Next, we will show that the partial derivative of the short-time Fourier trans- 
form with respect to time can be computed without explicitly computing or 
approximating any derivatives. Taking the partial derivative of the short-time 
Fourier transform given by Equation 3, 

-X{t, ^) = Q-J x{T)h{t - T)e-^--dT (98) 



dt 



e-'^'^dT (99) 

= j x{r)hv{t - T)e-^^^dT (100) 
^Xvh{t,u;) (101) 

where /^©(t) = ^h{t) is the time-derivative anlaysis window described in Sec- 
tion 6 and shown in Figure 6, and Xx>h{t,ui) is the short-time Fourier trans- 
form computed using this time-derivative analysis window. Multiplying, as 
before (and with the same caveat concerning zero- valued distributions), by 
jX*{t,u)/\X{t,u;)\^, we obtain 

dX{t,u;) X*{t,u;) __ Xy^it^u) ■ X*{t,uj) 

dt ' |X(t,a;)|2 \X{t,u;)\^ ^ ' 



Substituting this expression into Equation 97, we obtain 

, d<^{t,u) , ^| Xp^(t,cu)-X*(t,^) | 
c.(t, c.) = + = + ^^^^^ 1 (103) 

which is the frequency reassignment operation proposed by Auger and Flan- 
drin, previously presented in Equation 29. 
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Fig. 7. Reassigned spectrogram for the onset of an acoustic bass tone having a sharp 
pluck and a fundamental frequency of approximately 73.4 Hz. The spectrogram was 
computed using a 65.7 ms Kaiser window with a shaping parameter of 12. The sharp 
attack and the harmonic components are clearly visible. 



7 Pruning Reassigned Data to Improve Spectrogram Readability 



Reassigned time and frequency estimates are mucli more precise tlian tliose ob- 
tained from traditional metliods, so for many applications, the "readability" or 
"interpretability" of the spectral data is much improved by reassignment. The 
resolving power of the reassigned short-time Fourier transform is no greater 
than that of the classical short-time Fourier transform. The separability con- 
ditions discussed in Section 5 apply equally to reassigned and non-reassigned 
spectra, and components smeared together by the analysis window will be 
smeared in both reassigned and non-reassigned spectral data. Provided the 
separability conditions are satisfied, however, reassigned spectrograms offer 
improved clarity in the representation of quasi-sinusoidal components and im- 
proved localization of impulsive events This improved readability is evident 
in the plot in Figure 7, showing a reassigned spectrogram for the same bass 
pluck that was plotted in Figures 1 and 2. The sharp attack is clearly visible 
in this reassigned data, as are the harmonic components. 

In spite of the obvious gains in clarity, reassigned spectrograms can be dis- 
appointingly noisy. Seemingly random speckle is visible in regions where the 
reassigned data is not clearly associated with either a sinusoidal component 
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Fig. 8. Reassigned spectrogram of the vowel e (day) computed using a 7.8 ms Hann 
window. 

or an impulsive component. This random speckle can be seen between the 
harmonics in Figure 7, and in the reassigned spectrogram shown in Figure 8, 
representing a portion of the vowel e (day), spoken in a "creaky" (low airflow) 
voice. The main glottal impulses appear as dark grey lines, and these result 
from the sudden release of a puff of air into the mouth from below the vo- 
cal cords. There are also fainter secondary impulses of unknown cause (these 
have been attributed to the mechanics of the vocal cord action), appearing 
just prior to the main impulses in this particular voice sample. 

Reassignment provides a mapping from the geometrical center of the short- 
time analysis window to the center of gravity of a nearby dominant spectral 
component, but in low-energy regions of the spectrum, where there is no domi- 
nant component, data is reassigned in a way that has no apparent relationship 
to the structure of the analyzed signal. 

The reassignment operations can be used to gauge the quality or reliability of 
spectral analysis data. Large time or frequency reassignments indicate energy 
concentrated far from the geometrical center of the analysis window. Since 
window functions used in spectral analysis emphasize signal energy near their 
geometrical centers and de-emphasize signal energy far from their centers, 
large reassignments indicate data derived primarily from signal features that 
are not well-represented in a particular analysis window. Reassigned spectral 
data judged to be unreliable on the basis of large time reassignments [18] or 
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large frequency reassignment s [15] can be pruned from the representation to 
further improve its readabihty, with the assurance that, owing to the high 
redundancy of short-time Fourier transform data, the unrehable data will be 
represented more reliably in a neighboring short-time Fourier transform frame 
or channel. Gardner [15] further showed that consensus among neighboring 
estimates of reassigned frequency indicates that those frequency estimates are 
reliable, and therefore consensus can be used to guide the choice of an opti- 
mal analysis window for resolving sinusoidal components in spectrally sparse 
sounds. 

When there is a high degree of consensus among reassigned frequency es- 
timates, then the reassigned frequency, uj{t,uj) is changing very little with 
respect to the center frequency of the analysis window, that is 

^ 0. (104) 

Since, from Equation 29, the reassigned frequency is computed from the partial 
derivative of spectral phase with respect to time, consensus among reassigned 
frequency estimates can be evaluated locally from the mixed partial deriva- 
tive of spectral phase with respect to time and frequency. Specifically, in the 
vicinity of quasi-sinusoidal components, all frequencies, uj, should be mapped 
to approximately the same reassigned frequency, cu{t,u!), so 

ouj atauj 



Nelson showed that the mixed partial derivative of spectral phase can be used 
to clean up or "de-speckle" reassigned spectrograms by removing data that 
does not correspond to strongly sinusoidal or impulsive components in the 
analyzed signal [10,11]. By plotting just those points in a reassigned spec- 
trogram meeting the condition on the mixed partial derivative expressed in 
Equation 105, a spectrogram showing just the strongly-sinusoidal components 
can be drawn. In a speech signal, analyzed using a short analysis window, these 
will be chiefly the vocal tract resonances. A reassigned spectrogram pruned in 
this way is shown in Figure 9 for the same speech signal plotted in Figure 8. 

Nelson further demonstrated that impulsive components in a signal should 
be characterized by a high degree of consensus among neighboring reassigned 
time estimates. Near the time of the impulse, the reassigned time, t{t,Lv), 
should be changing very slowly with respect to the temporal center of the 
analysis window, that is, 

%^ « 0. (106) 

Since, from Equation 28, the reassigned time is computed from the partial 
derivative of spectral phase with respect to frequency, consensus among re- 
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Creaky "e" waveform 




Fig. 9. Reassigned spectrogram showing only the sinusoidal components in the vowel 
e (day). The spectrogram was computed using a 7.8 ms Hann window. 

assigned time estimates can also be evaluated locally from the mixed partial 
derivative of spectral phase. In the vicinity of impulsive components, all times, 
t, should be mapped to approximately the same reassigned time, t{t,u!), so 

%)=_^!^«0. (107) 

By plotting just those points in a reassigned spectrogram meeting the condi- 
tion on the mixed partial derivative expressed in Equation 107, a spectrogram 
can be drawn that clearly and precisely localizes the impulsive events in a 
signal. For example, using a short (relative to the fundamental period) analy- 
sis window, the individual glottal pulses in a speech signal can be plotted, as 
shown for the creaky e signal in Figure 10. 

Using the phase of the moving window transform. Nelson identified the reas- 
signed data corresponding to sinusoidal components as the points satisfying 

and the reassigned data corresponding to impulsive components as the points 
satisfying 

"1?- (-) 
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Creaky "e" waveform 
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Fig. 10. Reassigned spectrogram showing only the impulsive components in the 
vowel e (day). The spectrogram was computed using a 7.8 ms Hann window. 

Equations 108 and 109 are exactly equivalent to Equations 105 and 107, re- 
spectively; the apparent shift by one reflects the difference in spectral phase 
reported by the moving window transform and the short-time Fourier trans- 
form. 



By plotting just those points in a reassigned spectrogram meeting the dis- 
junction of these two conditions yields a "denoised" reassigned spectrogram 
showing precisely-locaUzed quasi-sinusoidal and impulsive components, and 
excluding the objectionable "speckle". Figure 11 shows a fully "dcspcckled" 
reassigned spectrogram of the creaky e, constructed solely from reassigned 
spectral data satisfying one of the conditions in Equations 105 and 107. Fig- 
ure 12 shows a similarly "despeckled" reassigned spectrogram for the acoustic 
bass pluck shown in Figure 7. 

Nelson used finite differences to compute the mixed partial derivative of spec- 
tral phase, but using the derivations of the reassignment operations in Sec- 
tions 6.2.1 and 6.2.2, it can be shown the the mixed partial derivative can be 
computed directly from Fourier transforms by 

-^tduT^^X — — r^i — xHtM — j ^ ^ 

where Xrvh(t,uj) is the short-time Fourier transform of x{t) computed using 
a window hrv{t) = ^'^^^{t), that is, the window used to compute Xx>h{t,(x!) 
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Creaky "e" waveform 
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Fig. 11. "Despeckled" reassigned spectrogram showing only the sinusoidal and im- 
pulsive components in the vowel e (day). The spectrogram was computed using a 
7.8 ms Hann window. 



multiplied by a time ramp. 



8 Phase-Correct Additive Sound Modeling 



In many applications, only the reassigned energy distribution is desired. In 
sound modeling applications, where the goal is to construct a model of the 
sound that can be used to reconstruct the sound, possibly with modifications, 
we retain not the squared magnitude of the Fourier transform, the spectro- 
gram, but the magnitude and phase of the transform. 

The reassigned bandwidth-enhanced additive sound model [24] is a high-fidelity 
representation that allows manipulations and transformations to be applied 
to a great variety of sounds, including noisy and non-harmonic sounds. It is 
similar in spirit to traditional sinusoidal models [6,19,25] in that a waveform 
is modeled as a collection of components, called partials, having time-varying 
frequencies and amplitudes. Estimates of partial frequency, amplitude, and 
phase are obtained by following ridges on a reassigned timc-frcquency surface, 
such as the one shown in Figure 13, constructed by reassigning discrete short- 
time Fourier transform data. This algorithm shares with traditional sinusoidal 
methods the notion of temporally-connected partial parameter estimates, but 
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Fig. 12. "Despeckled" reassigned spectrogram showing only the sinusoidal and im- 
pulsive components in the onset of an acoustic bass tone having a sharp pluck and a 
fundamental frequency of approximately 73.4 Hz. The spectrogram was computed 
using a 65.7 ms Kaiser window with a shaping parameter of 12. 



by contrast, the reassigned estimates are non-uniformly distributed in both 
time and frequency. The reassigned bandwidth-enhanced model yields greater 
resolution in time and frequency than is possible using conventional additive 
techniques, and can preserve the temporal envelope of transient signals, even 
in modified reconstruction, if the short-time phase information is properly 
maintained [18]. Preserving phase is important for reproducing transients and 
short-duration complex sounds having significant information in the temporal 
envelope [26]. 

The phase reported by the short-time Fourier transform is referenced to the 
geometrical center of the analysis window in time and frequency. For a sinu- 
soid having instantaneous frequency equal to uji(t) (which is assumed to be 
slowly- varying with respect to time), the argument of the short-time Fourier 
transform evaluated at t,Ui{t) will be precisely the phase of the sinusoid at 
time t. If the short-time Fourier transform is evaluated at some nearby fre- 
quency, u!i{t) + e, then the argument will not be precisely the phase of the 
sinusoid, because the transform is equivalent to the output of a bank of linear 
phase bandpass filters. These filters have phase equal to zero at the center of 
their passbands, so a sinusoid having frequency equal to the center frequency 
of the filter will see no phase shift, but a sinusoid having frequency not equal 
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Fig. 13. Reassigned spectral surface for the onset of an acoustic bass tone having 
a sharp pluck and a fundamental frequency of approximately 73.4 Hz. The spec- 
trogram was computed using a 65.7 ms Kaiser window with a shaping parameter 
of 12. 

to the center frequency of the filter (but still in the passband) will experience 
a linear phase shift. 

Frequency reassignment computes frequencies for discrete short-time Fourier 
transform data that are equal to the instantaneous frequency of the dominant 
component in the signal under analysis at each time and frequency at which 
the transform is evaluated, and attributes the data to these reassigned frequen- 
cies. Since the reassigned data represent energy in signal components having 
frequencies that arc not at the geometrical center of the analysis window, it 
follows that the data is perturbed by a linear phase shift. This perturbation 
is easy to correct, because the slope of the phase response of the transform 
filters is known. In fact, since the phase is linear over the entire passband of 
the filter (this is why the finite difference approximation to the derivative is 
so accurate at points of significant energy, sec section 6.1), the phase of reas- 
signed short-time data can be corrected to agree with the reassigned frequency 
by linear interpolation of the discrete short-time phase spectrum. 

Similarly, data that is reassigned in time away from the geometrical center of 
the analysis window needs to be corrected for the phase travel due to sinu- 
soidal oscillation over the interval of time reassignment (that is, the interval 
between the reassigned time and the temporal center of the analysis window) . 
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In order to account for this phase travel precisely, the frequency trajectory 
must be known precisely. For many sounds, provided that the analysis win- 
dow is not too long, the frequency can be assumed to be constant over the 



t{t,uj) -t 



is sufficient to 



time reassignment interval, so a shift by uj{t,Lj) 
correct the phase of reassigned short-time data to agree with the reassigned 
time {u!{t, cu) is the reassigned instantaneous frequency and t{t, cu) — t is time 
reassignment interval) . 



9 Computation of Higher-Order Phase Derivatives 



The method of reassignment uses the partial derivatives of spectral phase with 
respect to time and frequency. Applications of higher-order partial derivatives 
of spectral phase have been proposed as well. Nelson showed that higher-order 
partial derivatives of phase could be approximated using cross-spectral sur- 
faces [10,11], but did not discuss any applications beyond the estimation of 
frequency slope, or chirp rate. Rihaczek [9] proposed that the second deriva- 
tives of the spectral phase be used to estimate the optimal dimensions of the 
time-frequency cells in a Gabor decomposition. 

By methods similar to those described in Sections 6.2.1 and 6.2.2, it can be 
shown that the second partial derivative of phase with respect to frequency 
and time can be computed 



dmt,Lj) ( fXrh{t,u;)X*{t,ij)Y\ (Xr2h{t,uj)X*{t,uj) 



du^ U \X{t,u;)\^ I 1 \X{t,u 



^ ^ J Xr,2hit,u;)X*it,Lo) \ ^ f XMt,u;)X*it,uj) 



2- 



(111) 

(112) 



where Xj-2h{t,uj) is the short-time Fourier transform of x{t) computed using 
a window hq--2{t) = t^h(t) and Xx)2h[t,uj) is the short-time Fourier transform 
of x{t) computed using a window hx)2{t) = ^h{t). 

We think that higher-order phase derivatives might be useful in computing 
local estimates of time and frequency spread that will allow us to construct 
more robust models of noisy sounds. 



10 Conclusion 



We have presented the theory of time-frequency reassignment in the context 
of the spectrogram, the most commonly-used time-frequency representation 
in speech and audio processing. Time-frequency reassignment sharpens blurry 
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timc-frcquency data by relocating the data according to local estimates of 
instantaneous frequency and group delay. This mapping to reassigned time- 
frequency coordinates is very precise for signals that are separable in time and 
frequency with respect to the analysis window. We have discussed methods for 
computing reassigned times and frequencies in digital systems, and offered a 
derivation of one popular and efficient method. Wc believe that many speech 
and audio processing applications employing short-time spectral analysis could 
benefit from the straightforward application of the method of reassignment, 
and have discussed some examples from our own research. We further believe 
that extensions to the method of reassignment to efficiently compute mixed 
and higher-order spectral derivatives may provide a means of computing other 
features of interest in speech and audio signal processing. 
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